Reference-based Super-Resolution (Ref-SR) has recently emerged as a promising paradigm to enhance a low-resolution (LR) input image or video by introducing an additional high-resolution (HR) reference image. Existing Ref-SR methods mostly rely on implicit correspondence matching to borrow HR textures from reference images to compensate for the information loss in input images. However, performing local transfer is difficult because of two gaps between input and reference images: the transformation gap (e.g., scale and rotation) and the resolution gap (e.g., HR and LR). To tackle these challenges, we propose C2-Matching in this work, which performs explicit robust matching across transformation and resolution. 1) To bridge the transformation gap, we propose a contrastive correspondence network, which learns transformation-robust correspondences using augmented views of the input image. 2) To address the resolution gap, we adopt teacher-student correlation distillation, which distills knowledge from the easier HR-HR matching to guide the more ambiguous LR-HR matching. 3) Finally, we design a dynamic aggregation module to address the potential misalignment issue between input images and reference images. In addition, to faithfully evaluate the performance of Reference-based Image Super-Resolution under a realistic setting, we contribute the Webly-Referenced SR (WR-SR) dataset, mimicking the practical usage scenario. We also extend C2-Matching to the Reference-based Video Super-Resolution task, where an image taken in a similar scene serves as the HR reference image. Extensive experiments demonstrate that our proposed C2-Matching significantly outperforms state-of-the-art methods on the standard CUFED5 benchmark, and also boosts the performance of video SR by incorporating the C2-Matching component into video SR pipelines.
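The contrastive correspondence idea can be illustrated with a standard InfoNCE-style objective: descriptors of the same point seen under an augmented (e.g., rotated or scaled) view form a positive pair, while descriptors at other locations serve as negatives. The sketch below uses hypothetical toy descriptors and a hand-rolled `info_nce` helper; it is an illustration of the general objective, not the paper's network:

```python
import math

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE loss for one correspondence: pull the anchor descriptor
    toward its positive (the same point under augmentation) and push it
    away from descriptors of other locations (negatives)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

    logits = [cos(anchor, positive) / tau] + [cos(anchor, n) / tau for n in negatives]
    m = max(logits)  # stabilized log-sum-exp
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]

anchor = [1.0, 0.0]
# a positive that geometrically matches the anchor vs. one that does not
loss_easy = info_nce(anchor, [0.9, 0.1], [[-1.0, 0.2], [0.0, 1.0]])
loss_hard = info_nce(anchor, [0.0, 1.0], [[1.0, 0.1], [0.9, 0.0]])
```

A low loss for the geometrically consistent pair is what training drives toward, yielding descriptors that match the same point across transformed views.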
Federated learning (FL) enables distributed model training from local data collected by users. In distributed systems with constrained resources and potentially high dynamics, e.g., mobile edge networks, the efficiency of FL is an important problem. Existing works have separately considered different configurations to make FL more efficient, such as infrequent transmission of model updates, client subsampling, and compression of update vectors. However, an important open problem is how to jointly apply and tune these control knobs in a single FL algorithm, to achieve the best performance by allowing a high degree of freedom in control decisions. In this paper, we address this problem and propose FlexFL - an FL algorithm with multiple options that can be adjusted flexibly. Our FlexFL algorithm allows both arbitrary rates of local computation at clients and arbitrary amounts of communication between clients and the server, making both the computation and communication resource consumption adjustable. We prove a convergence upper bound of this algorithm. Based on this result, we further propose a stochastic optimization formulation and algorithm to determine the control decisions that (approximately) minimize the convergence bound, while conforming to constraints related to resource consumption. The advantage of our approach is also verified using experiments.
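The three control knobs that FlexFL exposes (how many clients participate, how much local computation each performs, and how much of each update is communicated) can be sketched on a toy quadratic problem. Everything below (`flexible_fl`, `top_k`, the scalar client targets) is a hypothetical illustration of jointly tunable knobs, not the paper's algorithm:

```python
import random

def top_k(vec, k):
    """Keep only the k largest-magnitude coordinates of an update
    vector (a simple sparsifying compression of the communication)."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def flexible_fl(targets, rounds=30, sample=2, local_steps=5, k=1, lr=0.2):
    """Toy federated loop: each client minimizes (w - target)^2 locally.
    `sample` controls client subsampling, `local_steps` the amount of
    local computation, and `k` the communicated update size."""
    random.seed(0)
    w = [0.0, 0.0]
    for _ in range(rounds):
        chosen = random.sample(targets, sample)              # client subsampling
        deltas = []
        for target in chosen:
            wl = list(w)
            for _ in range(local_steps):                     # local computation
                wl = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(wl, target)]
            deltas.append(top_k([a - b for a, b in zip(wl, w)], k))  # compression
        w = [wi + sum(d[i] for d in deltas) / len(deltas) for i, wi in enumerate(w)]
    return w

w = flexible_fl([[1.0, 3.0], [3.0, 1.0], [2.0, 2.0], [2.0, 2.0]])
```

Even with aggressive subsampling and top-1 compression, the toy model settles near the region spanned by the client optima, which is the kind of trade-off the convergence bound in the paper quantifies.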
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
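Object masking for training can be pictured as rasterizing detector boxes into a binary inpainting mask over the image. The helper `boxes_to_mask` below is a hypothetical sketch of that idea, not Imagen Editor's implementation:

```python
def boxes_to_mask(height, width, boxes):
    """Turn detector boxes (x0, y0, x1, y1), with exclusive upper bounds,
    into a binary inpainting mask: 1 inside any box (the region the model
    must fill from the text prompt), 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask

# a single detected object on a 4x4 toy image
mask = boxes_to_mask(4, 4, [(1, 1, 3, 3)])
```

Because object boxes tend to cover semantically meaningful regions, masks built this way force the model to rely on the text prompt when filling them in, which is the intuition behind the alignment gains reported above.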
Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation and/or computation, without any assurance that their generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets. Also, we find that PINTO's rationales are more faithful to its task predictions than those generated by competitive baselines.
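The counterfactual regularization idea can be sketched as a two-term loss: the usual task loss when conditioning on the clean rationale, plus a penalty that pushes the model toward low-confidence (high-entropy) predictions when the rationale is perturbed. The function `pinto_style_loss` and its toy probability vectors are a hypothetical sketch of this objective, not the paper's training code:

```python
import math

def cross_entropy(probs, label):
    """Standard task loss on the prediction made with the clean rationale."""
    return -math.log(probs[label])

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def pinto_style_loss(p_clean, p_perturbed, label, lam=1.0):
    """Task loss on the clean rationale, plus a regularizer that penalizes
    confident predictions under a perturbed rationale, so the model cannot
    simply ignore the rationale when solving the task."""
    max_entropy = math.log(len(p_perturbed))  # entropy of the uniform distribution
    return cross_entropy(p_clean, label) + lam * (max_entropy - entropy(p_perturbed))

# a model that becomes uncertain when the rationale is corrupted (faithful)
faithful = pinto_style_loss([0.9, 0.1], [0.5, 0.5], label=0)
# a model that stays confident regardless of the rationale (unfaithful)
unfaithful = pinto_style_loss([0.9, 0.1], [0.9, 0.1], label=0)
```

The faithful behavior incurs the smaller loss, so training prefers models whose predictions genuinely depend on the rationale.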
Goal-oriented generative script learning aims to generate subsequent steps based on a goal, which is an essential task to assist robots in performing stereotypical activities of daily life. We show that the performance of this task can be improved if historical states are not only captured by the language instructions given to people, but are also augmented with the additional information provided by accompanying images. We therefore propose a new task, Multimedia Generative Script Learning, to generate subsequent steps by tracking historical states in both the text and vision modalities, and present the first benchmark containing 2,338 tasks and 31,496 steps. We aim to generate scripts that are visual-state trackable, inductive for unseen tasks, and diverse in their individual steps. We propose to encode visual state changes through a multimedia selective encoder, transfer knowledge from previously observed tasks using a retrieval-augmented decoder, and present distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation quality and inductive quality. Experimental results demonstrate that our approach significantly outperforms strong baselines.
With the rapid development of network technology and the rapid growth of network devices, data throughput has increased dramatically. To address the backhaul bottleneck in cellular networks and satisfy users' latency requirements, network architectures are designed to proactively keep a limited set of popular contents at the network edge based on prediction results. Meanwhile, the interactions between contents (e.g., deep neural network models, Wikipedia-like knowledge bases) and users can be modeled as a dynamic bipartite graph. In this paper, to maximize the cache hit rate, we leverage an effective dynamic graph neural network (DGNN) to jointly learn the structural and temporal patterns embedded in the bipartite graph. Furthermore, to gain a deeper insight into the dynamics of the evolving graph, we propose an Age of Information (AoI)-based attention mechanism to extract valuable historical information while avoiding the problem of message staleness. Combining the above prediction model, we also develop a cache selection algorithm to make caching decisions based on the prediction results. Extensive results demonstrate that our model achieves higher prediction accuracy than other state-of-the-art schemes on two real-world datasets. The hit-rate results further verify the superiority of caching policies based on our proposed model over other traditional approaches.
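One simple way to realize AoI-based attention is to discount each raw attention score by the age of the corresponding historical interaction before the softmax, so stale messages contribute less. The `aoi_attention` helper and its linear decay are a hypothetical sketch of this mechanism, not the paper's exact formulation:

```python
import math

def aoi_attention(scores, ages, decay=0.5):
    """Softmax attention where each raw score is discounted in proportion
    to the Age of Information of the corresponding interaction, so older
    (staler) neighbors in the dynamic bipartite graph receive less weight."""
    adjusted = [s - decay * a for s, a in zip(scores, ages)]
    m = max(adjusted)  # stabilized softmax
    exps = [math.exp(x - m) for x in adjusted]
    z = sum(exps)
    return [e / z for e in exps]

# three equally relevant interactions observed 0, 1, and 2 time units ago
weights = aoi_attention([1.0, 1.0, 1.0], [0.0, 1.0, 2.0])
```

With equal raw scores, the freshest interaction receives the largest weight, which is exactly the staleness-avoiding behavior the attention mechanism is designed for.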
We show that pre-trained Generative Adversarial Networks (GANs) such as StyleGAN and BigGAN can be used as a latent bank to improve the performance of image super-resolution. While most existing perceptual-oriented approaches attempt to generate realistic outputs through learning with adversarial loss, our method, Generative Latent Bank (GLEAN), goes beyond existing practice by directly leveraging the rich and diverse priors encapsulated in a pre-trained GAN. Unlike prevalent GAN inversion methods that require expensive image-specific optimization at runtime, our approach only needs a single forward pass for restoration. GLEAN can be easily incorporated into a simple encoder-bank-decoder architecture with multi-resolution skip connections. Adopting priors from different generative models allows GLEAN to be applied to diverse categories (e.g., human faces, cats, buildings, and cars). We further present a lightweight version of GLEAN, named LightGLEAN, which retains only the critical components of GLEAN. Notably, LightGLEAN consists of only 21% of the parameters and 35% of the FLOPs while achieving comparable image quality. We extend our method to different tasks, including image colorization and blind image restoration, and extensive experiments show that our proposed models perform favorably against existing methods. Code and models are available at https://github.com/open-mmlab/mmediting.
Measuring the perception of visual content is a long-standing problem in computer vision. Many mathematical models have been developed to evaluate the look or quality of an image. Despite the effectiveness of such tools in quantifying degradations such as noise and blur levels, such quantification is loosely coupled with human language. When it comes to more abstract perception about the feel of visual content, existing methods can only rely on supervised models that are explicitly trained with labeled data collected via laborious user studies. In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. In particular, we discuss effective prompt designs and present an effective prompt pairing strategy to harness the prior. We also provide extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments. Code will be available at https://github.com/iceclear/clip-iqa.
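The prompt pairing strategy can be sketched as a two-way softmax over the cosine similarities between an image embedding and the embeddings of an antonym prompt pair (e.g., "Good photo." vs. "Bad photo."). The toy 2-D features and the `antonym_prompt_score` helper below are hypothetical stand-ins for real CLIP embeddings:

```python
import math

def antonym_prompt_score(image_feat, good_feat, bad_feat, tau=0.07):
    """Zero-shot quality score as the softmax probability of the 'good'
    prompt over an antonym pair, computed from cosine similarities
    between the image embedding and the two prompt embeddings."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    sg = cos(image_feat, good_feat) / tau
    sb = cos(image_feat, bad_feat) / tau
    m = max(sg, sb)  # stabilized two-way softmax
    return math.exp(sg - m) / (math.exp(sg - m) + math.exp(sb - m))

good, bad = [1.0, 0.0], [0.0, 1.0]       # hypothetical prompt embeddings
sharp = antonym_prompt_score([0.9, 0.2], good, bad)   # image close to "good"
noisy = antonym_prompt_score([0.1, 0.9], good, bad)   # image close to "bad"
```

Pairing antonyms turns raw, hard-to-calibrate similarity values into a relative probability, which is what makes the zero-shot score comparable across images.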
Image restoration algorithms for atmospheric turbulence are known to be much more challenging to design than traditional ones for blur or noise, because the distortion caused by turbulence is an entanglement of spatially varying blur, geometric distortion, and sensor noise. Existing CNN-based restoration methods, built upon convolutional kernels with static weights, are insufficient to handle the spatially dynamical atmospheric turbulence effect. To address this problem, in this paper we propose a physics-inspired transformer model for imaging through atmospheric turbulence. The proposed network utilizes the power of transformer blocks to jointly extract a dynamical turbulence distortion map and restore a turbulence-free image. In addition, recognizing the lack of a comprehensive dataset, we collect and present two new real-world turbulence datasets that allow for evaluation with both classical objective metrics (e.g., PSNR and SSIM) and a new task-driven metric using text recognition accuracy. Real-world test sets and all related code will be made publicly available.
For the past 25 years, we have witnessed an extensive application of machine learning to the compiler space, e.g., the selection and phase-ordering problems. However, limited works have been upstreamed into the state-of-the-art compiler, i.e., LLVM, to seamlessly integrate the former into the optimization pipeline of a compiler to be readily deployed by users. MLGO was one of the first such projects, and it only strives to reduce the code size of a binary with an ML-based inliner trained using reinforcement learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML inliner. It employs a secondary ML model to generate rewards for training a retargeted reinforcement learning agent, which was previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis, and it enables a fast training framework for the primary model that would otherwise be impractical. Experimental results show that MLGOPerf gains 1.8% and 2.2% over LLVM's optimizations at O3 when trained for performance on the SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% more opportunities to autotune code regions for our benchmarks, which can be translated into an additional 3.7% speedup.